teacher feedback
FEAT: A Preference Feedback Dataset through a Cost-Effective Auto-Generation and Labeling Framework for English AI Tutoring
Seo, Hyein, Hwang, Taewook, Lee, Yohan, Jung, sangkeun
In English education tutoring, teacher feedback is essential for guiding students. Recently, AI-based tutoring systems have emerged to assist teachers; however, these systems require high-quality and large-scale teacher feedback data, which is both time-consuming and costly to generate manually. In this study, we propose FEAT, a cost-effective framework for generating teacher feedback, and have constructed three complementary datasets: (1) DIRECT-Manual (DM), where both humans and large language models (LLMs) collaboratively generate high-quality teacher feedback, albeit at a higher cost; (2) DIRECT-Generated (DG), an LLM-only generated, cost-effective dataset with lower quality;, and (3) DIRECT-Augmented (DA), primarily based on DG with a small portion of DM added to enhance quality while maintaining cost-efficiency. Experimental results showed that incorporating a small portion of DM (5-10%) into DG leads to superior performance compared to using 100% DM alone.
Debate, Reflect, and Distill: Multi-Agent Feedback with Tree-Structured Preference Optimization for Efficient Language Model Enhancement
Zhou, Xiaofeng, Huang, Heyan, Liao, Lizi
Large Language Models (LLMs) continue to set new standards in knowledge-intensive and complex reasoning tasks, yet their high computational demands limit widespread adoption. While distilling large models into smaller ones offers a sustainable solution, current techniques--such as static knowledge distillation, resource-intensive reinforcement learning from human feedback, or limited self-reflection--struggle to yield substantial and lasting performance gains. In this paper, we present a novel Debate and Reflect (D&R) framework that orchestrates multi-turn debates between smaller models and stronger teacher models, eliciting actionable feedback (e.g., error analysis, corrective strategies) to guide student models. Further, we introduce Tree-structured Direct Preference Optimization (T-DPO) to efficiently leverage these debate logs, organizing interactions into a hierarchical format for effective training. Empirical evaluations across diverse NLP benchmarks demonstrate that our approach significantly improves smaller-model accuracy, robustness, and generalization, outperforming conventional baselines by a large margin.
Human AI interaction loop training: New approach for interactive reinforcement learning
Reinforcement Learning (RL) in various decision-making tasks of machine learning provides effective results with an agent learning from a stand-alone reward function. However, it presents unique challenges with large amounts of environment states and action spaces, as well as in the determination of rewards. This complexity, coming from high dimensionality and continuousness of the environments considered herein, calls for a large number of learning trials to learn about the environment through Reinforcement Learning. Imitation Learning (IL) offers a promising solution for those challenges using a teacher. In IL, the learning process can take advantage of human-sourced assistance and/or control over the agent and environment. A human teacher and an agent learner are considered in this study. The teacher takes part in the agent training towards dealing with the environment, tackling a specific objective, and achieving a predefined goal. Within that paradigm, however, existing IL approaches have the drawback of expecting extensive demonstration information in long-horizon problems. This paper proposes a novel approach combining IL with different types of RL methods, namely state action reward state action (SARSA) and asynchronous advantage actor-critic (A3C) agents, to overcome the problems of both stand-alone systems. It is addressed how to effectively leverage the teacher feedback, be it direct binary or indirect detailed for the agent learner to learn sequential decision-making policies. The results of this study on various OpenAI Gym environments show that this algorithmic method can be incorporated with different combinations, significantly decreases both human endeavor and tedious exploration process.
A Survey of Automated Programming Hint Generation -- The HINTS Framework
McBroom, Jessica, Koprinska, Irena, Yacef, Kalina
Automated tutoring systems offer the flexibility and scalability necessary to facilitate the provision of high quality and universally accessible programming education. In order to realise the full potential of these systems, recent work has proposed a diverse range of techniques for automatically generating hints to assist students with programming exercises. This paper integrates these apparently disparate approaches into a coherent whole. Specifically, it emphasises that all hint techniques can be understood as a series of simpler components with similar properties. Using this insight, it presents a simple framework for describing such techniques, the Hint Iteration by Narrow-down and Transformation Steps (HINTS) framework, and it surveys recent work in the context of this framework. It discusses important implications of the survey and framework, including the need to further develop evaluation methods and the importance of considering hint technique components when designing, communicating and evaluating hint systems. Ultimately, this paper is designed to facilitate future opportunities for the development, extension and comparison of automated programming hint techniques in order to maximise their educational potential.